BioImageDbs 1.0.6
Last modified: 2021-10-14 03:35:36
Compiled: Thu Oct 14 03:36:45 2021
In recent years, the need for machine learning-based data analysis has grown in bioimaging research. Machine learning is an inductive, data-driven approach, and building models for tasks such as image segmentation and classification requires the image data itself. Therefore, publishing and sharing bioimage datasets [1] and creating knowledge by attaching metadata to bioimages [2,3] are important issues. At present, there is no commonly used format for sharing bioimage datasets, and the data are scattered among various repositories. As a result, different image repositories manage the data (the images themselves and metadata such as image format, instruments/microscopes and biosamples) in different formats.
In data analysis and quantification using those images, several image pre-processing steps are usually needed, depending on the analysis environment. Supervised learning, however, starts with finding a repository that holds a bioimage dataset with original images and their corresponding supervised labels. Once the repository is found, the image data must be downloaded, loaded into the analysis environment and converted into a format suitable for the analytical package. These steps are time-consuming and precede the main analysis. In addition, most image repositories do not publish data in a format that is easy to read and process in R (.Rdata, etc.), so the data are not easy for R users to use.
For supervised learning on bioimage data, BioImageDbs provides R list objects that hold the original images and their corresponding supervised labels, converted into 4D or 5D arrays. After retrieval from ExperimentHub, the data can be used for deep learning with Keras/TensorFlow [4] and other machine learning methods without further pre-processing.
Many image analysis packages are also available in R; however, image analysis lacks standardisation. Using common, open datasets is an essential step in standardising and comparing analytical methods. Providing image arrays through ExperimentHub is also intended for applications such as (1) comparing models on data shared among R users and (2) making predictions on new datasets through transfer learning and fine-tuning based on these arrays.
The BioImageDbs package provides the metadata for all bioimage datasets
in ExperimentHub. The datasets are preprocessed into array format and
stored in ExperimentHub.
First we load/update the ExperimentHub resource.
library(ExperimentHub)
eh <- ExperimentHub()
Next we list all BioImageDbs entries from ExperimentHub.
query(eh, "BioImage")
## ExperimentHub with 46 records
## # snapshotDate(): 2021-05-18
## # $dataprovider: CELL TRACKING CHALLENGE (http://celltrackingchallenge.net/2...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster
## # $rdataclass: List, magick-image
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH6851"]]'
##
## title
## EH6851 | EM_id0001_Brain_CA1_hippocampus_region_5dTensor.rds
## EH6852 | EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif
## EH6853 | EM_id0002_Drosophila_brain_region_5dTensor.rds
## EH6854 | EM_id0002_Drosophila_brain_region_5dTensor_train_dataset.gif
## EH6855 | LM_id0001_DIC_C2DH_HeLa_4dTensor.rds
## ... ...
## EH6892 | LM_id0003_Fluo_N2DH_GOWT1_5dTensor.Rds
## EH6893 | EM_id0003_J558L_4dTensor.Rds
## EH6894 | EM_id0003_J558L_4dTensor_train_dataset.gif
## EH6895 | EM_id0004_PrHudata_4dTensor.Rds
## EH6896 | EM_id0004_PrHudata_4dTensor_train_dataset.gif
We can check the metadata held in ExperimentHub (Bioconductor S3 bucket)
with mcols().
mcols(query(eh, "BioImage"))
## DataFrame with 46 rows and 15 columns
## title dataprovider species
## <character> <character> <character>
## EH6851 EM_id0001_Brain_CA1_.. https://www.epfl.ch/.. Mus musculus
## EH6852 EM_id0001_Brain_CA1_.. https://www.epfl.ch/.. Mus musculus
## EH6853 EM_id0002_Drosophila.. the ISBI 2012 Challe.. Drosophila melanogas..
## EH6854 EM_id0002_Drosophila.. the ISBI 2012 Challe.. Drosophila melanogas..
## EH6855 LM_id0001_DIC_C2DH_H.. CELL TRACKING CHALLE.. Homo sapiens
## ... ... ... ...
## EH6892 LM_id0003_Fluo_N2DH_.. CELL TRACKING CHALLE.. Mus musculus
## EH6893 EM_id0003_J558L_4dTe.. Pattern Recognition .. Mus musculus
## EH6894 EM_id0003_J558L_4dTe.. Pattern Recognition .. Mus musculus
## EH6895 EM_id0004_PrHudata_4.. Pattern Recognition .. Homo sapiens
## EH6896 EM_id0004_PrHudata_4.. Pattern Recognition .. Homo sapiens
## taxonomyid genome description coordinate_1_based
## <integer> <character> <character> <integer>
## EH6851 10090 NA 5D arrays with the b.. 1
## EH6852 10090 NA A animation file (.g.. 1
## EH6853 7227 NA 5D arrays with the b.. 1
## EH6854 7227 NA A animation file (.g.. 1
## EH6855 9606 NA 4D arrays with the m.. 1
## ... ... ... ... ...
## EH6892 10090 NA 5D arrays with the m.. 1
## EH6893 10090 NA The mouse B myeloma .. 1
## EH6894 10090 NA A animation file (.g.. 1
## EH6895 9606 NA The primary human T .. 1
## EH6896 9606 NA A animation file (.g.. 1
## maintainer rdatadateadded preparerclass
## <character> <character> <character>
## EH6851 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6852 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6853 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6854 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6855 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## ... ... ... ...
## EH6892 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6893 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6894 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6895 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## EH6896 Satoshi Kume <satosh.. 2021-05-18 BioImageDbs
## tags rdataclass
## <list> <character>
## EH6851 3D images,bioimage,CellCulture,... List
## EH6852 animation,bioimage,CellCulture,... magick-image
## EH6853 3D image,bioimage,CellCulture,... List
## EH6854 animation,bioimage,CellCulture,... magick-image
## EH6855 bioimage,cell tracking,CellCulture,... List
## ... ... ...
## EH6892 bioimage,cell tracking,CellCulture,... List
## EH6893 2D images,bioimage,CellCulture,... List
## EH6894 2D images,bioimage,CellCulture,... magick-image
## EH6895 2D images,bioimage,CellCulture,... List
## EH6896 2D images,bioimage,CellCulture,... magick-image
## rdatapath sourceurl sourcetype
## <character> <character> <character>
## EH6851 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6852 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6853 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6854 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6855 BioImageDbs/v01/LM_i.. https://github.com/k.. PNG
## ... ... ... ...
## EH6892 BioImageDbs/v01/LM_i.. https://github.com/k.. PNG
## EH6893 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6894 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6895 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
## EH6896 BioImageDbs/v01/EM_i.. https://github.com/k.. PNG
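The metadata table can be filtered like an ordinary data frame, for example to keep only the array records (rdataclass "List") and skip the preview animations. The following is a minimal sketch on a hand-made three-row table whose values are copied from the listing above; the helper name keep_array_records is hypothetical and is not part of BioImageDbs or ExperimentHub.

```r
# Toy metadata table: three rows copied from the query output shown above.
md <- data.frame(
  title = c(
    "EM_id0001_Brain_CA1_hippocampus_region_5dTensor.rds",
    "EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif",
    "LM_id0001_DIC_C2DH_HeLa_4dTensor.rds"
  ),
  species = c("Mus musculus", "Mus musculus", "Homo sapiens"),
  rdataclass = c("List", "magick-image", "List"),
  row.names = c("EH6851", "EH6852", "EH6855"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only records stored as R lists of arrays.
keep_array_records <- function(metadata) {
  metadata[metadata$rdataclass == "List", , drop = FALSE]
}

arrays_only <- keep_array_records(md)
rownames(arrays_only)  # ExperimentHub IDs of the array records
```

With a live hub, the same row subsetting should apply to the table returned by mcols(query(eh, "BioImage")), since it exposes an rdataclass column as shown above.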
We can retrieve only the BioImageDbs records of a specific dataset (here, LM_id0001) as follows.
qr <- query(eh, c("BioImageDbs", "LM_id0001"))
qr
## ExperimentHub with 10 records
## # snapshotDate(): 2021-05-18
## # $dataprovider: CELL TRACKING CHALLENGE (http://celltrackingchallenge.net/2...
## # $species: Homo sapiens
## # $rdataclass: List, magick-image
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH6855"]]'
##
## title
## EH6855 | LM_id0001_DIC_C2DH_HeLa_4dTensor.rds
## EH6856 | LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif
## EH6857 | LM_id0001_DIC_C2DH_HeLa_4dTensor_Binary.rds
## EH6858 | LM_id0001_DIC_C2DH_HeLa_4dTensor_Binary_train_dataset.gif
## EH6859 | LM_id0001_DIC_C2DH_HeLa_5dTensor.rds
## EH6878 | LM_id0001_DIC_C2DH_HeLa_4dTensor.Rds
## EH6879 | LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif
## EH6880 | LM_id0001_DIC_C2DH_HeLa_4dTensor_Binary.Rds
## EH6881 | LM_id0001_DIC_C2DH_HeLa_4dTensor_Binary_train_dataset.gif
## EH6882 | LM_id0001_DIC_C2DH_HeLa_5dTensor.Rds
#Import data
#BioImageDbs_image_Dat <- qr[[1]]
The array dimensions follow the channels_last ordering, which is the default in R/Keras. A 5D array has the input shape (batch, spatial_dim1, spatial_dim2, spatial_dim3, channels), where the batch size equals the number of 3D image sets. The number of channels is 1 for grey images and 3 for RGB images.
A 4D array also follows the channels_last ordering and has the input shape (batch, height, width, channels), where the batch size equals the number of 2D images.
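To make the channels_last layout concrete, the sketch below builds synthetic 4D and 5D arrays in base R and indexes into them. All sizes are arbitrary assumptions chosen for illustration and do not correspond to any BioImageDbs dataset; n_channels is a hypothetical helper.

```r
# Synthetic stand-ins for image arrays (sizes are arbitrary assumptions).
# 4D layout: batch, height, width, channels
x4 <- array(0, dim = c(10, 64, 64, 1))
# 5D layout: batch, spatial_dim1, spatial_dim2, spatial_dim3, channels
x5 <- array(0, dim = c(2, 32, 64, 64, 1))

# Hypothetical helper: with channels_last, the channel count is the last dim.
n_channels <- function(x) dim(x)[length(dim(x))]

# Extract the first 2D image (single channel) as a plain matrix.
img <- x4[1, , , 1]

# Extract the first 3D image set (single channel) as a 3D array.
vol <- x5[1, , , , 1]

dim(img)
dim(vol)
n_channels(x4)
```

Indexing the first and last dimensions drops them, so img is a height-by-width matrix and vol is a 3D array that can be passed to ordinary R image functions.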
As a test, we also provide gif files of some arrays for visualization.
The files can be displayed with the magick::image_read function.
qr <- query(eh, c("BioImageDbs", ".gif"))
qr
## ExperimentHub with 20 records
## # snapshotDate(): 2021-05-18
## # $dataprovider: CELL TRACKING CHALLENGE (http://celltrackingchallenge.net/2...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster
## # $rdataclass: magick-image
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH6852"]]'
##
## title
## EH6852 | EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif
## EH6854 | EM_id0002_Drosophila_brain_region_5dTensor_train_dataset.gif
## EH6856 | LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif
## EH6858 | LM_id0001_DIC_C2DH_HeLa_4dTensor_Binary_train_dataset.gif
## EH6861 | LM_id0002_PhC_C2DH_U373_4dTensor_train_dataset.gif
## ... ...
## EH6886 | LM_id0002_PhC_C2DH_U373_4dTensor_Binary_train_dataset.gif
## EH6889 | LM_id0003_Fluo_N2DH_GOWT1_4dTensor_train_dataset.gif
## EH6891 | LM_id0003_Fluo_N2DH_GOWT1_4dTensor_Binary_train_dataset.gif
## EH6894 | EM_id0003_J558L_4dTensor_train_dataset.gif
## EH6896 | EM_id0004_PrHudata_4dTensor_train_dataset.gif
#EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_data
qr[1]
## ExperimentHub with 1 record
## # snapshotDate(): 2021-05-18
## # names(): EH6852
## # package(): BioImageDbs
## # $dataprovider: https://www.epfl.ch/labs/cvlab/data/data-em/
## # $species: Mus musculus
## # $rdataclass: magick-image
## # $rdatadateadded: 2021-05-18
## # $title: EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif
## # $description: A animation file (.gif) of the train dataset of EM_id0001_Br...
## # $taxonomyid: 10090
## # $genome: NA
## # $sourcetype: PNG
## # $sourceurl: https://github.com/kumeS/BioImageDbs
## # $sourcesize: NA
## # $tags: c("animation", "bioimage", "CellCulture", "electron
## # microscopy", "microscope", "scanning electron microscopy",
## # "segmentation", "Tissue")
## # retrieve record with 'object[["EH6852"]]'
##Display the gif image
#magick::image_read(qr[[1]])
Figure 1: EM_id0001_Brain_CA1_hippocampus_region_5dTensor_train_dataset.gif
Figure 2: EM_id0002_Drosophila_brain_region_5dTensor_train_dataset.gif
Figure 3: EM_id0003_J558L_4dTensor_train_dataset.gif
Figure 4: EM_id0004_PrHudata_4dTensor_train_dataset.gif
Figure 5: EM_id0005_Mouse_Kidney_2D_All_Mito_1024_4dTensor_dataset.gif
Figure 6: EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds
Figure 7: EM_id0006_Rat_Liver_2D_All_Mito_1024_4dTensor_dataset.gif
Figure 8: EM_id0006_Rat_Liver_2D_All_Nuc_1024_4dTensor_dataset.gif
Figure 9: EM_id0007_Mouse_Kidney_MultiScale_All_Low_Glomerulus_1024_4dTensor_dataset.gif
Figure 10: EM_id0007_Mouse_Kidney_MultiScale_All_Middle_Podocyte_1024_4dTensor_dataset.gif
Figure 11: EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor_dataset.gif
Figure 12: EM_id0008_Human_NB4_2D_All_Nuc_1024_4dTensor_dataset.gif
Figure 13: EM_id0009_MurineBMMC_All_512_4dTensor_dataset.gif
Figure 14: EM_id0010_HumanBlast_All_512_4dTensor_dataset.gif
Figure 15: EM_id0011_HumanJurkat_All_512_4dTensor_dataset.gif
Figure 16: LM_id0001_DIC_C2DH_HeLa_4dTensor_train_dataset.gif
Figure 17: LM_id0002_PhC_C2DH_U373_4dTensor_train_dataset.gif
Figure 18: LM_id0003_Fluo_N2DH_GOWT1_4dTensor_train_dataset.gif
We select a data array and a label array from the data list and assign them to variables. As an example, these variables are then passed as the x and y arguments of the Keras fit function (the fit method for <keras.engine.training.Model>). The Keras model must be defined and compiled beforehand.
## Not Run ##
# qr <- query(eh, c("BioImageDbs"))
# BioImageData <- qr[[1]]
# data <- BioImageData$Train$Train_Original
# labels <- BioImageData$Train$Train_GroundTruth
# dim(data); dim(labels)
# model %>% fit( x = data, y = labels )
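Before calling fit, it can help to confirm that the image and label arrays agree on the batch and spatial dimensions. The sketch below uses a synthetic list that mimics the Train$Train_Original and Train$Train_GroundTruth structure used above; the array sizes are assumptions and same_layout is a hypothetical helper, not a Keras or BioImageDbs function.

```r
# Synthetic list mimicking the structure shown above (sizes are assumptions).
BioImageData <- list(
  Train = list(
    Train_Original    = array(runif(4 * 16 * 16), dim = c(4, 16, 16, 1)),
    Train_GroundTruth = array(0, dim = c(4, 16, 16, 1))
  )
)

data   <- BioImageData$Train$Train_Original
labels <- BioImageData$Train$Train_GroundTruth

# Hypothetical check: both arrays must have the same rank and the same
# batch and spatial dimensions (every dimension except the last, channels).
same_layout <- function(x, y) {
  dx <- dim(x)
  dy <- dim(y)
  length(dx) == length(dy) && all(head(dx, -1) == head(dy, -1))
}

same_layout(data, labels)
```

The channel dimension is deliberately excluded from the comparison, because images and labels may legitimately differ there (for example, RGB images with single-channel masks).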
The datasets in BioImageDbs were built from the following published open data:
For cellular ultra-microstructures, electron microscopy-based imaging data of the mouse B myeloma cell line J558L (e.g. EM_id0003_J558L_4dTensor.Rda) [5], primary human T cells isolated from peripheral blood mononuclear cells (e.g. EM_id0004_PrHudata_4dTensor.Rda) [5], human NB-4 cells (e.g. EM_id0008_Human_NB4_2D_All_Cel_512_4dTensor.Rds) [3], murine bone marrow-derived mast cells (e.g. EM_id0009_MurineBMMC_All_512_4dTensor.Rds) [5], human blasts (e.g. EM_id0010_HumanBlast_All_512_4dTensor.Rds) [5] and the human T-cell line Jurkat (e.g. EM_id0011_HumanJurkat_All_512_4dTensor.Rds) [5] were used.
For bio-tissue ultra-microstructures, electron microscopy-based imaging data of the mouse brain (e.g. EM_id0001_Brain_CA1_hippocampus_region_5dTensor.Rda) [6,7], Drosophila brain (e.g. EM_id0002_Drosophila_brain_region_5dTensor.Rda) [8,9], mouse kidney (e.g. EM_id0005_Mouse_Kidney_2D_All_Nuc_1024_4dtensor.Rds) [10] and rat liver (e.g. EM_id0006_Rat_Liver_2D_All_Mito_1024_4dtensor.Rds) [10] were used.
For cell tracking, light microscopy-based imaging data of human HeLa cells on flat glass (e.g. LM_id0001_DIC_C2DH_HeLa_4dTensor.Rda) [11,12], human glioblastoma-astrocytoma U373 cells on a polyacrylamide substrate (e.g. LM_id0002_PhC_C2DH_U373_4dTensor.Rda) [11,12] and GFP-GOWT1 mouse stem cells (e.g. LM_id0003_Fluo_N2DH_GOWT1_4dTensor.Rda) [13] were used.
The supervised labels are provided as array data with binary or multiple values. Detailed information is given in the metadata file of BioImageDbs. Some of the cell tracking data were obtained from the Cell Tracking Challenge.
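Because label arrays may hold either binary or multiple values, it is worth inspecting the distinct values before training. The sketch below uses a synthetic label array; the sizes, the value range 0 to 3, the convention that 0 encodes background, and the helper to_binary are all assumptions for illustration and are not taken from the BioImageDbs metadata.

```r
# Synthetic multi-valued label array (batch, height, width, channels).
set.seed(1)
labels <- array(sample(0:3, 2 * 8 * 8, replace = TRUE), dim = c(2, 8, 8, 1))

# List the distinct label values present in the array.
label_values <- sort(unique(as.vector(labels)))
label_values

# Hypothetical helper: collapse every non-zero label to 1.
# Assumes that 0 encodes background, which should be verified per dataset.
to_binary <- function(lbl) {
  (lbl > 0) * 1  # arithmetic on a logical array keeps its dimensions
}

binary_labels <- to_binary(labels)
dim(binary_labels)
```

If a dataset uses a different background value, the comparison inside to_binary would need to change accordingly.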
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ExperimentHub_2.0.0 AnnotationHub_3.0.1 BiocFileCache_2.0.0
## [4] dbplyr_2.1.1 BiocGenerics_0.38.0 BiocStyle_2.20.2
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.52.0 httr_1.4.2
## [3] sass_0.4.0 bit64_4.0.5
## [5] jsonlite_1.7.2 bslib_0.3.0
## [7] shiny_1.6.0 assertthat_0.2.1
## [9] interactiveDisplayBase_1.30.0 highr_0.9
## [11] BiocManager_1.30.16 stats4_4.1.0
## [13] blob_1.2.2 GenomeInfoDbData_1.2.6
## [15] tiff_0.1-8 yaml_2.2.1
## [17] BiocVersion_3.13.1 pillar_1.6.2
## [19] RSQLite_2.2.8 lattice_0.20-44
## [21] glue_1.4.2 digest_0.6.27
## [23] promises_1.2.0.1 XVector_0.32.0
## [25] htmltools_0.5.2 httpuv_1.6.3
## [27] pkgconfig_2.0.3 magick_2.7.3
## [29] bookdown_0.24 zlibbioc_1.38.0
## [31] purrr_0.3.4 xtable_1.8-4
## [33] fftwtools_0.9-11 jpeg_0.1-9
## [35] later_1.3.0 tibble_3.1.4
## [37] KEGGREST_1.32.0 EBImage_4.34.0
## [39] generics_0.1.0 IRanges_2.26.0
## [41] ellipsis_0.3.2 cachem_1.0.6
## [43] withr_2.4.2 magrittr_2.0.1
## [45] crayon_1.4.1.9000 mime_0.11
## [47] memoise_2.0.0 evaluate_0.14
## [49] fansi_0.5.0 tools_4.1.0
## [51] lifecycle_1.0.0 stringr_1.4.0
## [53] S4Vectors_0.30.0 locfit_1.5-9.4
## [55] AnnotationDbi_1.54.1 Biostrings_2.60.2
## [57] compiler_4.1.0 jquerylib_0.1.4
## [59] GenomeInfoDb_1.28.4 rlang_0.4.11
## [61] grid_4.1.0 RCurl_1.98-1.5
## [63] htmlwidgets_1.5.4 rappdirs_0.3.3
## [65] bitops_1.0-7 rmarkdown_2.11
## [67] abind_1.4-5 DBI_1.1.1
## [69] curl_4.3.2 R6_2.5.1
## [71] knitr_1.34 dplyr_1.0.7
## [73] fastmap_1.1.0 bit_4.0.4
## [75] utf8_1.2.2 filelock_1.0.2
## [77] stringi_1.7.4 Rcpp_1.0.7
## [79] vctrs_0.3.8 png_0.1-7
## [81] tidyselect_1.1.1 xfun_0.26